Network analysis

Amelia McNamara

August 19, 2016

Networks

In the 1930s, Jacob Moreno was thinking about the “social atom” and making “sociograms” (network diagrams).

(Image via Mark Hansen)

More examples

Game of Thrones data

I am a little out of the cultural loop, so I don’t have a lot of contextual knowledge, but the hottest network analysis lately is about Game of Thrones, so…

library(readr)
storm <- read_csv("stormofswords.csv")
head(storm)
## # A tibble: 6 x 3
##   Source  Target Weight
##    <chr>   <chr>  <int>
## 1  Aemon   Grenn      5
## 2  Aemon Samwell     31
## 3  Aerys   Jaime     18
## 4  Aerys  Robert      6
## 5  Aerys  Tyrion      5
## 6  Aerys   Tywin      8

Working with networks

R packages:

Other ways:

Lets try igraph first

library(igraph)
library(GGally)
g <- graph.data.frame(storm)
plot(g)

Defaults are pretty ugly.

Making igraph a bit more readable

plot(g, edge.arrow.size=0.2, vertex.size=7, vertex.label.cex=0.7)

Changing the layout– circle

plot(g, edge.arrow.size=0.2, vertex.size=7, vertex.label.cex=0.7, layout=layout_in_circle)

Changing the layout – sphere

plot(g, edge.arrow.size=0.2, vertex.size=7, vertex.label.cex=0.7, layout=layout_on_sphere)

But really, we want the “random” network to be less collapsed

l <- layout_with_fr(g,niter=500)
plot(g, edge.arrow.size=0.2, vertex.size=7, vertex.label.cex=0.7, layout=l)

It can be good to save your layout separately so you don’t get a subtle variation every time you plot.

Curved edges

plot(g, edge.arrow.size=0.2, vertex.size=7, vertex.label.cex=0.7, layout=l, edge.curved=TRUE)

What about GGally?

Lots of dependencies necessary.

library(GGally)
library(network)
library(sna)
library(intergraph)
ggnet2(g, label=TRUE)

But, defaults are prettier.

Weighted edges

ggnet2(g, label=TRUE, edge.size = "Weight", edge.alpha=0.5)

AAAAA

Okay, lets do some rescaling

library(dplyr)
storm <- storm %>%
  mutate(ScaledWeight = Weight / max(Weight))
g <- graph.data.frame(storm)
ggnet2(g, label=TRUE, edge.size = "ScaledWeight", node.size = 4)

Now I think we’ve gone too far! You will have to figure out what makes sense.

ggraph

# We're going off the rails a bit here
# library(devtools)
# install_github('hadley/ggplot2')
# install_github('thomasp85/ggforce')
# install_github("thomasp85/ggraph")
library(ggraph)
ggraph(g, 'igraph',algorithm = 'kk') +
  geom_edge_fan(aes(alpha = ..index..)) + 
  geom_node_point() +
  ggforce::theme_no_axes()

Networks in d3 – the lazy way!

library(networkD3)
gd <- get.data.frame(g)
simpleNetwork(gd)

There is more to networks than just node-edge diagrams

Lots of “centralities”

Eigenvector Centrality

ec <- eigen_centrality(g)$vector
ec <- data.frame(ec)
ec$names <- rownames(ec)
ec %>%
  arrange(desc(ec)) %>%
  slice(1:10) %>%
  ggplot() + geom_bar(aes(x=reorder(names, ec), y=ec), stat="identity")

Looks like what Andre Beveridge and Jie Shan got!

Betweenness centrality

bc <- betweenness.estimate(g, cutoff=10)
bc <- data.frame(bc)
bc$names <- rownames(bc)
bc %>% 
  arrange(desc(bc)) %>%
  slice(1:10)
##           bc   names
## 1  332.97460  Tyrion
## 2  244.63571 Samwell
## 3  226.20476 Stannis
## 4  208.62302  Robert
## 5  138.66667   Mance
## 6  119.99563   Jaime
## 7  114.33333  Sandor
## 8  111.26667     Jon
## 9   90.65000   Janos
## 10  64.59762   Aemon

Not quite what they got in the paper. Hmm.

Page rank

pageRank <- page_rank(g, damping = 0.15)$vector
head(as.matrix(sort(pageRank,decreasing = TRUE)))
##                [,1]
## Margaery 0.01282288
## Samwell  0.01238905
## Loras    0.01208873
## Drogo    0.01108128
## Qhorin   0.01097118
## Roslin   0.01088275

Again, not the same as the paper.

Adjacency matrix

p <- ggplot(storm, aes(x=Source, y=Target, fill=Weight)) +
  geom_raster() +
  theme_bw() +
  scale_x_discrete(drop = FALSE) +
  scale_y_discrete(drop = FALSE) +
  theme(
    # Rotate the x-axis lables so they are legible
    axis.text.x = element_text(angle = 270, hjust = 0),
    # Force the plot into a square aspect ratio
    aspect.ratio = 1,
    # Hide the legend (optional)
    legend.position = "none")
p

Whoa!

Reordering

# library(devtools)
# install_github("hadley/forcats")
library(forcats)
storm_ordered <- storm %>%
  left_join(ec, by = c("Source" = "names"))
storm_ordered <- storm_ordered %>%
  mutate(Source = fct_reorder(Source, ec),
         Target = fct_reorder(Target, ec))

# swap the data in the plot
p %+% storm_ordered

Change ordering to by betweenness centrality?

Light aside – Bertin matrices

Another one

In R

# install.packages("bertin",repos="http://r-forge.r-project.org")
library(bertin)
data(Hotel)
image.bertin(bertinrank(Hotel, ties.method="first"), main= "Hotel data")

plot.bertin(bertinrank(Hotel, ties.method = "first"), main= "Hotel data")

plot.bertin(Hotel, palette=c("white","black"))

On our data, doesn’t work well because it is so big!

mini_storm <- storm_ordered %>%
  filter(Source %in% ec$names[1:10],
         Target %in% ec$names[1:10]) %>%
  mutate(Source = droplevels(Source),
         Target = droplevels(Target)) %>%
  select(Source, Target, Weight)
g2 <- graph.data.frame(mini_storm)
mat <- get.adjacency(g2)
plot.bertin(bertinrank(mat, ties.method="first"))

Hmm…